perm filename RAND[TLK,DBL] blob
sn#198231 filedate 1976-01-25 generic text, type T, neo UTF8
.COMMENT !XGPCOMMANDS←"/PMAR=2200";
.DEVICE XGP
.FONT 1 "BASL30"
.FONT 2 "BASB30"
.FONT 4 "BASI30"
.FONT 5 "BDR40"
.FONT 6 "NGR25"
.FONT 7 "NGR20"
.FONT 8 "SUP"
.FONT 9 "SUB"
.TURN ON "↑α↓_π[]{"
.TURN ON "⊗" FOR "%"
.TURN ON "@" FOR "%"
.PAGE FRAME 54 HIGH 83 WIDE
.AREA TEXT LINES 1 TO 53
.COUNT PAGE PRINTING "1"
.TABBREAK
.ODDLEFTBORDER←EVENLEFTBORDER←1000
.AT "ffi" ⊂ IF THISFONT ≤ 4 THEN "≠" ELSE "fαfαi" ⊃;
.AT "ffl" ⊂ IF THISFONT ≤ 4 THEN "α∞" ELSE "fαfαl" ⊃;
.AT "ff" ⊂ IF THISFONT ≤ 4 THEN "≥" ELSE "fαf" ⊃;
.AT "fi" ⊂ IF THISFONT ≤ 4 THEN "α≡" ELSE "fαi" ⊃;
.AT "fl" ⊂ IF THISFONT ≤ 4 THEN "∨" ELSE "fαl" ⊃;
.SELECT 1
.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
.MACRO E ⊂ APART END ⊃
.MACRO FAD ⊂ FILL ADJUST DOUBLE SPACE PREFACE 2 ⊃
.MACRO FAS ⊂ FILL ADJUST SINGLE SPACE PREFACE 1 ⊃
.FAS
.COMPACT
.SELECT 1
.PORTION THESIS
.PAGE←0
.NEXT PAGE
.INDENT 0
.TURN OFF "{∞→}"
.BEGIN CENTER
⊗5↓_Automating the Discovery of Scientific Concepts_↓⊗*
Text for a seminar at RAND, Friday, Feb. 13, 1976, 10:00am
⊗2Douglas B. Lenat⊗*
Artificial Intelligence Lab
Stanford University
.END
When we hear of some highly-lauded new discovery in our field, a
frequent reaction is "Why didn't I think of that? It's so obvious."
Even worse are the times we ⊗4did⊗* consider it, but dismissed it as
not worth pursuing. Discoveries are hard to make, but once done,
they often seem obvious. In a minute I'll give one possible
explanation for that phenomenon, by proposing a simple model for how
new concepts are formed in our minds.
In case you were hoping for some fascinating new insight into
cognition, you may be disappointed to learn that the model I'm
talking about is the standard paradigm of ⊗4heuristic search⊗*.
A computer program has been developed, embodying that model, aimed at
formulating new mathematical theories. The fact that I'm here talking
to you should indicate that it has had some measure of success.
Before we consider how to create new theories, let's focus in on how
we might explain or account for some particular discovery. As our
first example, consider the concept of prime numbers. How could
someone rationally have discovered that notion? Can we somehow
analyze their discovery?
Suppose you already know about counting and arithmetic, and you've
just been playing with the notion of "divisors-of" of a number. One
rule of thumb that you probably follow when devising new theories is
to consider extreme cases. In more formal terms, what we are saying
is that if we've just been studying an interesting function which
takes elements of set A into those of set B, and there is some
extremal subset of B, it's worth our time to consider which elements
from A map into that extremal set. Even more abstractly, one could
say that the inverse image of an unusual set is often unusual. If
our sets A and B are the natural numbers, and the function is
"number-of-divisors-of", then an extremal set of numbers might be the
set (1,2) of the smallest numbers, and their inverse image is simply
those numbers having at most 2 distinct divisors -- i.e., the primes.
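The inverse-image step can be traced concretely. AM itself was a LISP program; this Python fragment is purely illustrative, with function names of my own choosing:

```python
# Illustrative sketch of the "consider extremals" heuristic, not AM's code:
# take the inverse image, under number-of-divisors, of the extremal set {1, 2}.

def divisors(n):
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def inverse_image(f, domain, extremal_set):
    """Elements of the domain that f maps into the extremal set."""
    return [x for x in domain if f(x) in extremal_set]

# Numbers with at most 2 distinct divisors: 1, plus the primes.
unusual = inverse_image(lambda n: len(divisors(n)), range(1, 30), {1, 2})
```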
So we have reduced our task from "how could you possibly discover
primeness" to the somewhat simpler task of "how could you ever
discover the concept of divisors-of". We have turned a very hard
problem into a hard problem. We did it by citing a heuristic, a
rule-of-thumb which is certainly not guaranteed to work in all cases,
but which is worth remembering and using because it frequently leads
to worthwhile new concepts.
Getting back to our task here, suppose we knew how to count the
number of elements in a set, and how to take the cross-product of two
sets. Then we might display our knowledge pictorially thus, and a
quite natural question is "what is this missing operation?". We can
compute it easily, at least in this direction. For example, if we
consider the two numbers 7 and 8, then our first step is to find two
sets with 7 and 8 elements, respectively. Then take their
cross-product, then count how many members there are. We find 56.
Sure enough, this operation is multiplication, and its inverse, this
relation, is divisors-of.
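The "missing operation" in the square diagram can be computed exactly as described; here is a hypothetical sketch (names mine, not AM's):

```python
from itertools import product

def count(s):
    """Number of elements in a set."""
    return len(set(s))

def cross_product(a, b):
    """All ordered pairs drawn from the two sets."""
    return set(product(a, b))

# Two sets with 7 and 8 elements, respectively:
A = set(range(7))
B = set(range(100, 108))

# Completing the diagram: count the members of the cross-product.
missing_op = count(cross_product(A, B))   # 56 -- the operation is multiplication
```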
So we could draw a chain of reductions, from primes to divisors-of to
multiplication to counting and cross-product. Each step is justified
by some heuristic, here it is "consider extremals", here it is
"consider the inverse of an interesting operation", here it is
"complete the square diagram". This analysis could be carried out
indefinitely, but after a while the concepts involved get so
primitive that "discovering" them makes little sense.
Very often, a new discovery is really just one step in this kind of
process, just one heuristic application away from what is already
known. That is why it's often easy for us to believe that there
wasn't really so much effort needed to make the discovery. If we
already know this concept, and this heuristic, it almost seems
obvious that someone would make this step.
Why then don't we make discoveries continuously, every few minutes?
Why are even these 1-step advances worth publishing? The answer can
be found by stating explicitly how we think discoveries are made.
We're talking about reversing the flow of time here. Instead of
analyzing a discovery by breaking it into simpler and simpler ones,
we're now progressing in this direction, trying to grow new nodes on
this tree, using whatever heuristics are available.
The researcher has a bunch of concepts that he knows about, say a few
thousand, and a bunch of heuristics he can apply, say several
hundred. In any situation, then, he may literally have a million
alternatives to consider, all of them plausible avenues for
investigation. But nature is not kind to us, and only a very small
percentage of those new concepts will turn out to be interesting.
What is even worse, you can't whiz through this space, stopping at
the interesting ones, because it may require months of effort to
decide whether the concept you're studying is worthwhile or a
dead-end.
Let's take our tiny math example again, where we have just a few
concepts and heuristics. Even this tree becomes bushy when we try to
go in this direction. We could apply "looking at inverse" here, to
decide to investigate the kind of operation we normally call
projection. Or here: we could get bogged down in the morass of
examining numbers which have very many, rather than very few,
divisors. It's not clear that primes are at all interesting until
you've investigated them thoroughly enough to conjecture some
relationships between them and other concepts.
We can sum up this argument by saying that synthesis of new concepts
is a combinatorially explosive process, while analysis is not. That
is why it is harder to make new discoveries than to think up a
plausible scenario for how a given concept was discovered.
I'm saying that theory formation can be considered a heuristic
search: That is, a continual expansion of a tree of concepts and
relationships, guided by some judgmental criteria, some heuristic
rules of thumb.
If this is true, one should be able to write a computer program which
proposes new concepts in some domain. After picking the domain in
which the system will work, one must determine the initial core of
knowledge that it will start with, and collect the heuristics that
experts in that field use when trying to form theories. Let's go
through this development step by step.
I chose to have my system try to do research in mathematics, partly
because I have some experience in that activity, and could therefore
extract the needed heuristics by introspection,
instead of having to elicit them from other people.
Also, math is the
only natural science which doesn't have to contend with errorful
laboratory data. There are some other reasons, which I'll get into
later if you're interested.
So my first step was to list all the concepts I would allow the
system to know about. That is, a bunch of primitive notions which are
considered already-known by the system. This will be the starting
state of knowledge. But how can one decide which concepts to
include, and which ones not to? Three kinds of sets come to mind.
The first is a ⊗4complete core of knowledge⊗*,
where there are enough concepts to
derive all of past and future mathematics. Well, that is clearly
ill-defined;
there's no way to be sure that your choice of basic concepts is complete.
At the other extreme I considered starting the system off with a
⊗4minimal⊗* repertoire of concepts, meaning that if you removed any one of them,
it could never be re-derived from the remaining ones. This core was so small
it seemed almost meaningless; the truly fundamental ideas were so few that the
system would be working in a barren field for a long time. A third
kind of starting set came to mind, one which included just those concepts
which young children seem to have, which Piaget might have called
⊗4pre-numerical⊗*. These included static structural concepts like
sets and relations, and active ones like substitution, union, reversal, and
the composition of relations.
In all, about 100 concepts were called for. This is the set I
actually chose to let the system begin with. Notice that simply
executing some of the active concepts can produce new concepts. Say
one of our starting concepts is Compose-2-relations. Then Composing
the two relations Intersection and Complement, we can get the
relation Intersection(x, Complement(y)) which is just
Set-difference(x,y). If this operation weren't in our core of
concepts, it could then be added. Here are some of the concepts that
AM actually started with: <initial core slide>.
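The Set-difference example can be traced in a few lines. This is a sketch with a toy finite universe, so Complement is well-defined; the naming is mine, not AM's:

```python
U = set(range(10))   # a small finite universe, purely for illustration

def intersection(x, y):
    return x & y

def complement(y):
    return U - y

def compose_2_relations(f, g):
    """Feed g's result into f's second argument."""
    return lambda x, y: f(x, g(y))

# Intersection(x, Complement(y)) is exactly Set-difference(x, y):
set_difference = compose_2_relations(intersection, complement)
```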
In addition to specifying the initial concepts, we must collect the
heuristics that the system will be guided by. One way to do this is
to imagine the system making various discoveries, and deciding what
rules it will need to know. Another, somewhat fairer way is to get
experts to introspect on all the clues they use, all the knowledge
that guides their research efforts. Unfortunately, many of these
heuristics are demon-like, not even occurring to the expert except in
situations where they might be used. So it would be best to prepare
some exhaustive questionnaire for the experts, to systematically put
them in all conceivable situations, while they are introspecting.
We'll return to how to do this in a minute.
Some of the heuristics suggest new concepts to look at (like "look at
the inverse"). Others gave hints for ⊗4when⊗* to do things ("if you
can't find any examples of when a predicate is satisfied, then try to
generalize it"). Still others told ⊗4how⊗* to do something ("to fill
in examples of A, see what operators have a range which intersects
with A, and try applying them"). Finally, there are heuristics for
filling in new heuristics (like "after using the `extremals'
heuristic to derive a new subset S of A, then add the heuristics that
` A is a good set to use when testing conjectures involving f and
f↑-↑1. ' ").
The final task, at this level of detail, was to decide precisely what
information I would supply the system about each of these concepts.
Certainly we should provide the definition or some other definite way
of specifying each concept. Several alternative names would be nice.
Maybe we'll want to give examples of each... no, the system should be
able to find those. For operations, we can mention the domain and
range. Maybe we'll point to some more general or specialized concept.
Perhaps we know of some abstract representation in which this concept
can be quickly manipulated analogically (e.g., Venn diagrams for the
concept of Sets). Also, to help the system decide what to do, it
would be nice to supply some kind of rough judgment of the worth of
this concept, its interest, its usefulness, its simplicity, etc.
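Pulled together, the facets just listed might look like this for one concept. This is a hypothetical layout with illustrative field names; AM's actual property-list format differed:

```python
# One concept as a bundle of facets (illustrative, not AM's internal format).
compose_concept = {
    "names":           ["Compose", "Compose-2-relations"],
    "definition":      lambda f, g: (lambda x: f(g(x))),
    "domain_range":    ("Operation x Operation", "Operation"),
    "generalizations": ["Operation"],
    "specializations": [],
    "examples":        [],    # left empty: the system should find these itself
    "worth":           300,   # rough initial judgment of interestingness
}
```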
So now we've indicated the starting concepts and listed the heuristics.
How do we actually implement this as a computer program? What is
left to do? We must make precise our representations and algorithms.
How is a concept to be represented? How are the heuristics stored?
How are they used? What is the global control structure of the
system?
Let's turn to the heuristics. What requirements can we put on them?
Well, it would be nice to have each one come to the system's
attention only when it plausibly could be used. Sometimes, there
will be special information that just pertains to one kind of
concept. For example, some of the heuristics tell whether a given
composition is interesting or not. There's no reason to access them
unless we're trying to judge a composition. In fact, by looking at
the heuristics, it seemed to me that only a few very weak ones are
truly "global". Almost every single rule of thumb can be identified
with a particular concept (and all specializations of that concept).
It's almost as though the relevant heuristics were just another facet
of each concept, like domain/range, examples, and definition. We can
even break the heuristics down into finer categories, for each
concept C, like heuristics that deal with checking a C, heuristics
that deal with creating a new C, those that deal with judging how
interesting a particular instance of C is, and so on.
Hmmm. That would be a nice way to think of things, since it provides
us with the framework we talked about to pick the experts' brains, a
kind of "questionnaire" structuring. We just get them to think of
heuristics for each particular facet of each concept in turn.
There's no reason to put the resulting little bundles of heuristics
together; just leave them attached to the concepts' facets. Whenever
some concept C is the center of attention, the only heuristics that
the system need worry about are those attached to C, or some
generalization of C, and probably only those attached to the relevant
facets of C that the system is working on.
But what if we're thinking about performing an operation, like
Composing two relations R1 and R2? Then the heuristics from all 3
concepts have to work together. How can we arrange that? Well, a
reasonable Artificial Intelligence answer is: give them all the same
representation, let them be simple cooperating knowledge sources. For
example, they could all be predicate calculus statements, and
combining them would mean conjoining them. But that would involve
logical manipulations to conclude anything. Or they could all be
productions, and combining them would mean simply dumping them
together into one production system. I like that image better. We
still have to decide what the left and right hand sides of each
production rule will look like. But let that go for now.
Let me go back to how to implement the concepts. We may as well make
their facets explicit, say as properties on a property list. Hmmm.
Some of them are pretty trivial; domain/range should be just a
properly-formatted, static bit of data. But some are very
sophisticated; the algorithms part for instance. They will have to
be procedural, little programs. Well, that's all right.
What does the system actually do, now? It expands its knowledge, both
by creating new concepts and by finding out more about the
already-known ones. That means it creates new data structures like
these, and it fills out the properties of existing ones. In general,
the creation of a new one is directly suggested while filling in some
detail somewhere. So the only real top-level activity need be: pick
some facet of some concept, and try to find some new knowledge to put
in that slot. During this process, some new concepts may be created.
What does it mean for a new one to be created? Well, whoever created
it should probably know its definition, and something about its
worth. Probably also how it connects to some existing concepts. Once
it's created, most of its facets will be empty, so there will be many
new things for the system to do. The space of possible actions will
grow very rapidly.
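That top-level activity reduces to a small loop. This is scaffolding only; `pick_task` and `fill_in` are stand-ins for all the real work, and the names are mine:

```python
def top_level(concepts, pick_task, fill_in, steps):
    """Repeatedly pick some facet of some concept and try to fill it in;
    filling a facet may create brand-new concepts along the way."""
    for _ in range(steps):
        name, facet = pick_task(concepts)
        for new in fill_in(concepts[name], facet):   # any concepts created
            concepts.setdefault(new["name"], new)    # their facets start empty
    return concepts
```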
For those facets which are procedures, filling them in will mean
writing new little programs. For example, there should be a way to
find an efficient algorithm for doing multiplication once the system
has discovered it. Well, we know enough about automatic programming to
give the system those kinds of abilities. The knowledge stored under
the Algorithms concept, for example, will be automatic programming
techniques, which can use, say, the definition of any concept to
produce an executable algorithm for it.
But there is an important gap in what I've told you, namely: how does
the system decide what to do next! This was the same problem we
discussed earlier, where there might be a million possible things to
do at each moment.
The answer is that the system maintains a job-list, a sequence of
suggested activities to do, an ordered list of candidates for its
attention. For example, at one moment, there might be a hundred
entries, indicating that the system could Generalize the relation
Object-equality somehow, check the examples of Compose that were just
filled in, etc. What we now have to decide is precisely how
candidates get entered onto the job-list, and how it gets ordered.
Perhaps the most surprising design aspect of the system is that the
primitive scheme I implemented on what I thought was a temporary
basis works quite adequately. Each candidate job gets entered by a
heuristic; for example, this one is suggested by a heuristic which
recognized that there were too few random pairs of objects that
turned out to be equal. At the moment it is entered, whoever
suggested it should know some pretty definite reasons for why it
might be worth doing. So he attaches a list of reasons to it, and
assigns it a number from 0 to 1000. If the job was already in the
job-list, the new reasons are added, and the numeric value is
increased to the square-root of the sums of the squares of old and
new numbers.
There is a global level of "tolerance", which changes dynamically,
and no job is remembered if its value is below this threshold.
Similarly, there is a higher threshold which represents the lowest
number a job can have and actually be executed by the system. If no
job has a high enough rating, the system pauses and spends a minute
or two trying to dredge up new jobs. All the jobs with ratings
between these two thresholds are kept around just in case the
executing-threshold decreases.
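The whole job-list mechanism (ratings, root-sum-of-squares merging, and the two thresholds) fits in a short sketch. The class name, threshold values, and data layout are illustrative assumptions, not AM's code:

```python
import math

class Agenda:
    """Sketch of the job-list described above (not AM's actual code)."""

    def __init__(self, tolerance=100, execute_at=400):
        self.jobs = {}                 # job -> (rating, list of reasons)
        self.tolerance = tolerance     # below this, a job is forgotten
        self.execute_at = execute_at   # below this, a job is kept but not run

    def suggest(self, job, rating, reason):
        if job in self.jobs:
            old, reasons = self.jobs[job]
            rating = math.sqrt(old ** 2 + rating ** 2)   # merge old and new
            reasons = reasons + [reason]
        else:
            reasons = [reason]
        if rating >= self.tolerance:
            self.jobs[job] = (min(rating, 1000), reasons)

    def next_job(self):
        """Run the best-rated job, if any clears the execution threshold."""
        if not self.jobs:
            return None
        job, (rating, _) = max(self.jobs.items(), key=lambda kv: kv[1][0])
        if rating < self.execute_at:   # nothing worth running right now
            return None
        del self.jobs[job]
        return job
```

Note how a re-suggested job climbs: two independent reasons rated 300 merge to sqrt(300² + 300²), about 424, which clears an execution threshold of 400 that neither reason clears alone.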
The assumption that it makes sense to assign a global value to each
candidate is quite dangerous, and leads into a very controversial
aspect of my system: the fact that there has to be a rudimentary
calculus of interestingness. For example, the heuristic that
suggested this generalization had to possess a formula which said
something like "the numeric worth of this job (generalizing =) is
estimated to be 300 + 0.5*(value of =) + 10 * (the ratio of the
number of things which turned out to be not-= to those which were =)."
That is, we are assuming that there is an easily-derivable
self-consistent set of computational rules for manipulating the
estimated worth of various concepts. Every concept must have some
numeric description of its worth, actually a vector of numbers
describing it along several dimensions; every heuristic has some
worth assigned to it, and has a formula for guessing the worth of any
job it suggests or new concept it proposes.
Predicate calculus tries to preserve and manipulate validity, and
differential calculus lets you make transformations while maintaining
equality, but this calculus tries only to preserve interestingness or
estimated worth.
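The worth formula quoted a moment ago can be written out directly; this is an illustrative transcription, not AM's exact code:

```python
def worth_of_generalizing_eq(value_of_eq, num_unequal, num_equal):
    """Estimated worth of the job 'generalize Object-equality':
    300 + 0.5*(value of =) + 10*(ratio of not-= pairs to = pairs)."""
    ratio = num_unequal / max(num_equal, 1)   # guard against dividing by zero
    return 300 + 0.5 * value_of_eq + 10 * ratio
```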
Well, in any event, that was how AM was designed and created. You now know
all the essentials of its representations and algorithms.
Step back for a moment and look over what's been done.
Notice how little of this whole discussion is specific to doing math research.
The design for the concept-grower can be used to realize
almost any kind of theory formation system for a hard empirical science.
The only change from one program to another would be the particular
starting concepts, the particular facets each concept could have, and
of course the individual heuristics would change. So the design of the
system could be the same. Perhaps if several such systems were created,
they could be run together, and cooperate.
There is one more assumption which you might have missed in all this, namely that
mathematics is an empirical science.
This is something which is hard to believe unless you've actually done some
math research. You quickly find that it has very little to do with the
smooth flowing proofs in textbooks. It's more like physics, where you
gather some data, look for regularities, and induce some conjectures.
But often they can't be phrased concisely with the existing
terminology. This is a clue that some new definitions should be made.
Or perhaps some concept is too rigid, too narrow to permit some conjecture
to hold. Then we can modify it, generalize or relax it in some way.
So if anything, math research proceeds in almost the opposite order from
the way it is finally reported in books and journals.
Now let's see what lessons we can learn from my system, that apply to
programs you might be working on.
The first observation is that if the heuristic operators are sophisticated
enough, then a simple numeric kind of calculus is all the meta-level
heuristics you need for adequate performance. And you don't need any
explicit meta-meta-heuristics at all.
It would be interesting to see if that is true in some sense for other
tasks as well.
So let me mention some of the problems I had programming the system.
The first
problem I had was naming it. I wanted to call it
SAM, for Semi-Automated-Mathematician, but someone beat me to that name.
Another suggestion was ACE, for A Cycle Eater, but I modestly decided to call
it the Automated Mathematician, AM for short.
A more serious problem
we all face is how to cope with changes in the system's knowledge
base. The standard solution is to track down all the effects of each change, and
update the whole data base. The solution AM uses, and perhaps people too, is to
just record the changed information, and ⊗4not⊗* to update everything.
When a contradiction of some kind occurs, then the conflicting knowledge
is checked, the knowledge ⊗4it⊗* was based on is checked,
and so on, until we encounter
some changed opinion which explains the disagreement.
So the system is allowed to hold contradictory
viewpoints, so long as it could resolve any dispute if it had to.
This is one solution you might keep in mind for your knowledge-based systems.
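That lazy scheme (record each change, trace support chains only when beliefs conflict) might be sketched like this, with hypothetical names and data:

```python
beliefs = {}   # statement -> list of statements it was based on

def record(statement, support=()):
    """Just record the new opinion; do NOT update everything that depends on it."""
    beliefs[statement] = list(support)

def support_chain(statement, seen=None):
    """Everything this belief (transitively) rests on."""
    seen = set() if seen is None else seen
    for s in beliefs.get(statement, []):
        if s not in seen:
            seen.add(s)
            support_chain(s, seen)
    return seen

def explain_conflict(a, b, changed_opinions):
    """Trace both conflicting beliefs back until some changed opinion
    explains the disagreement; nothing else is ever updated."""
    involved = support_chain(a) | support_chain(b) | {a, b}
    return involved & set(changed_opinions)
```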
Of course ⊗4the⊗* standard problem in our business is the tremendous gap
between English prose that sounds plausible, that will convince a human,
and LISP code that
will run on a PDP10.
When talking to you, it is fine for me to say that
a Composition is interesting if
its domain and range sets coincide;
but for AM, I have to analyze the effect numerically,
I have to specify precisely how to calculate the interest value of the
resultant composition, say based on the interest values for the domain and
range sets, for the two constituent operations, for composition in general, etc.
The precision I need to instruct the machine still catches me unaware sometimes,
and I find myself underestimating the time I'll need to program some idea
I was sure would be a cinch.
This problem can be summed up as "Just because you can talk about something doesn't
mean you understand it".
There is no solution to this. It is one reason that we actually write these
programs.
It implies that we can talk about more than
we really understand.
Which suggests to me that it's about time for me to stop talking.